{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Regression\n\nThe following example shows how to fit a simple regression model with\n*auto-sklearn*.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from pprint import pprint\n\nimport sklearn.datasets\nimport sklearn.metrics\n\nimport autosklearn.regression\nimport matplotlib.pyplot as plt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Data Loading\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "X, y = sklearn.datasets.load_diabetes(return_X_y=True)\n\nX_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(\n    X, y, random_state=1\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Build and fit a regressor\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "automl = autosklearn.regression.AutoSklearnRegressor(\n    time_left_for_this_task=120,\n    per_run_time_limit=30,\n    tmp_folder=\"/tmp/autosklearn_regression_example_tmp\",\n)\nautoml.fit(X_train, y_train, dataset_name=\"diabetes\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## View the models found by auto-sklearn\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(automl.leaderboard())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Print the final ensemble constructed by auto-sklearn\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "pprint(automl.show_models(), indent=4)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Get the Score of the final ensemble\nAfter training the estimator, we can now quantify the goodness of fit. One possibility for\nis the [R2 score](https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score).\nThe values range between -inf and 1 with 1 being the best possible value. A dummy estimator\npredicting the data mean has an R2 score of 0.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "train_predictions = automl.predict(X_train)\nprint(\"Train R2 score:\", sklearn.metrics.r2_score(y_train, train_predictions))\ntest_predictions = automl.predict(X_test)\nprint(\"Test R2 score:\", sklearn.metrics.r2_score(y_test, test_predictions))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Plot the predictions\nFurthermore, we can now visually inspect the predictions. We plot the true value against the\npredictions and show results on train and test data. Points on the diagonal depict perfect\npredictions. Points below the diagonal were overestimated by the model (predicted value is higher\nthan the true value), points above the diagonal were underestimated (predicted value is lower than\nthe true value).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.scatter(train_predictions, y_train, label=\"Train samples\", c=\"#d95f02\")\nplt.scatter(test_predictions, y_test, label=\"Test samples\", c=\"#7570b3\")\nplt.xlabel(\"Predicted value\")\nplt.ylabel(\"True value\")\nplt.legend()\nplt.plot([30, 400], [30, 400], c=\"k\", zorder=0)\nplt.xlim([30, 400])\nplt.ylim([30, 400])\nplt.tight_layout()\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.13"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}